On the Average Cost Optimality Equation and the Structure of Optimal Policies for Partially Observable Markov Decision Processes
Abstract
We consider partially observable Markov decision processes with finite or countably infinite (core) state and observation spaces and a finite action set. Following a standard approach, an equivalent completely observed problem is formulated, with the same finite action set but with an uncountable state space, namely the space of probability distributions on the original core state space. By developing a suitable theoretical framework, it is shown that some characteristics induced in the original problem by the countability of the spaces involved carry over to the equivalent problem. Sufficient conditions are then derived for solutions to the average cost optimality equation to exist. We illustrate these results in the context of machine replacement problems. Structural properties of average cost optimal policies are obtained for a two-state replacement problem; these are similar to results available for discount optimal policies. The set of assumptions used compares favorably to others currently available.
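The "equivalent completely observed problem" mentioned above takes the belief (a probability distribution over core states) as its state, and moves from one belief to the next by Bayes' rule. The following is a minimal sketch of that update for a finite core state space, not the paper's own construction. The function name `belief_update`, the array layout of `P` and `Q`, and the two-state replacement numbers are all illustrative assumptions and do not come from the paper.

```python
import numpy as np

def belief_update(belief, P, Q, action, obs):
    """One Bayes step: belief before (action, obs) -> belief after.

    belief: shape (n_states,), probability vector over core states
    P: shape (n_actions, n_states, n_states), P[a, i, j] = Pr(next = j | state = i, action = a)
    Q: shape (n_actions, n_states, n_obs),   Q[a, j, y] = Pr(obs = y | next = j, action = a)
    Returns (posterior belief, probability of the observation).
    """
    predicted = belief @ P[action]                # distribution of the next core state
    unnormalized = predicted * Q[action][:, obs]  # weight by observation likelihood
    prob_obs = unnormalized.sum()
    if prob_obs == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / prob_obs, prob_obs

# Hypothetical two-state machine: state 0 = good, 1 = bad;
# action 0 = produce, 1 = replace; observation 0 = good item, 1 = defective item.
P = np.array([[[0.9, 0.1], [0.0, 1.0]],   # produce: machine may deteriorate
              [[1.0, 0.0], [1.0, 0.0]]])  # replace: machine is good again
Q = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.8, 0.2], [0.3, 0.7]]])

posterior, prob_obs = belief_update(np.array([1.0, 0.0]), P, Q, action=0, obs=1)
print(posterior, prob_obs)
```

Starting from a machine known to be good, producing and seeing a defective item moves the belief to roughly (0.72, 0.28). For a two-state problem the belief is a single number in [0, 1], which is the setting in which structural results for replacement policies are usually stated.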
Related resources
A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems
Maintenance can either increase or decrease a system's availability, so it is valuable to evaluate a maintenance policy from both the cost and the availability points of view, simultaneously and according to the decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...
On Near Optimality of the Set of Finite-State Controllers for Average Cost POMDP
We consider the average cost problem for partially observable Markov decision processes (POMDP) with finite state, observation, and control spaces. We prove that there exists an ε-optimal finite-state controller functionally independent of initial distributions for any ε > 0, under the assumption that the optimal liminf average cost function of the POMDP is constant. As part of our proof, we estab...
On the optimality equation for average cost Markov decision processes and its validity for inventory control
As is well known, average-cost optimality inequalities imply the existence of stationary optimal policies for Markov decision processes with average costs per unit time, and these inequalities hold under broad natural conditions. This paper provides sufficient conditions for the validity of the average-cost optimality equation for an infinite state problem with weakly continuous transition prob...
Optimality Inequalities for Average Cost Markov Decision Processes and the Stochastic Cash Balance Problem
For general state and action space Markov decision processes, we present sufficient conditions for the existence of solutions of the average cost optimality inequalities. These conditions also imply the convergence of both the optimal discounted cost value function and policies to the corresponding objects for the average costs per unit time case. Inventory models are natural applications of ou...
Optimality Inequalities for Average Cost Markov Decision Processes and the Optimality of (s, S) Policies
For general state and action space Markov decision processes, we present sufficient conditions for convergence of both the optimal discounted cost value function and policies to the corresponding objects for the average costs per unit time. We extend Schäl’s [24] assumptions, guaranteeing the existence of a solution to the average cost optimality inequalities for compact action sets, to non-com...